\section{Electromigration Enhancement Design}
Since EM affects standard cell power rails the most, we aim to reduce EM on power rails through AC-stress self-healing. We change the topology of the power network so that the power rails carry balanced bidirectional current.

\subsection{Design Mechanism}
An IC chip may have a complex power grid structure, but it can be recursively divided into power ring structures. Consequently, our baseline design is a structure with a power ring and an array of standard cell rails.

Our mechanism is to add a vertical power/ground (P/G) strip in the middle of the layout, on a different metal layer from the P/G ring, as shown in Figure~\ref{fig:1}. This additional strip is called the compensation power strip, and its width is similar to that of the P/G ring. The strip is connected to each standard cell power rail but is disconnected from the P/G ring.

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{figure/2}
\caption{\small{(A) A vertical Power/Ground strip (compensation strip) is added in the middle of the layout with two working modes (\textit{normal mode:} power is supplied to the block from the P/G ring with the compensation strip in
high-impedance state; \textit{compensation mode:} the PAD supplies the
compensation strip, with the regular P/G ring in high-impedance state); (B) chip layout divided into regular or irregular sizes with power grid.}}
\label{fig:1}
\end{figure}

There are two operation modes in this design: the normal mode and the
compensation mode. The current flow directions on the power rails are
shown in Figure~\ref{fig:1}A. Both modes are driven by the same set of
PADs to avoid increasing the PAD count. In the normal mode, power is
supplied to the block from the P/G ring. The transistors connecting
the PADs and the compensation strip are off, so the strip is in a
high-impedance state. In the compensation mode, the PADs supply the
compensation strip, and the P/G ring is in a high-impedance state. If a
block is too big to meet the IR drop requirement, it can be divided
into four no-strip blocks with their compensation strips connected together, as illustrated in the regular size division of Figure~\ref{fig:1}B. The four blocks switch into the normal or the compensation mode simultaneously. Irregular size division can also be applied (Figure~\ref{fig:1}B). The power rings of all blocks are connected together, and all the compensation strips are connected to form a new power compensation network.

\subsection{Design Consideration}
In the circuit implementation, several factors should be considered. The package plan, the switching performance overhead, and the control gating overhead must all be addressed so that the design fits all situations with minimum performance overhead.

\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{figure/figure_c}
\caption{\small{Schematic of an inverter chain whose power supply is gated by the $control$ and $\overline{control}$ signals.}}
\label{fig:c}
\end{figure}

\subsubsection{Package plan influence on power grids}
Two widely used chip packaging methods are wire bonding and flip chip. With
wire bonding, all input signals, including the power supply, enter from
the four edges of the chip. With flip chip, the die is placed
upside down, and solder balls attach the on-die PADs to the surface of the
package. These on-die PADs are not limited to the periphery. However,
there is still a minimum PAD pitch requirement, which corresponds to about 20
PADs/mm~\cite{Jing:PADdensity}. If half of the PADs are used for power supply, the
distance between two power PADs is 200~$\mu$m, and the same holds for ground
PADs. A similar power grid spacing is used to ensure reasonable IR
drop for both package plans. Thus, our proposed compensation
grids should comply with this spacing constraint.
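
As a rough check of these numbers, the spacing can be worked out as follows. This sketch assumes, as the 200~$\mu$m figure implies, that the supply PADs are split evenly between power and ground; the symbols $p$, $d_{\mathrm{supply}}$, $d_{\mathrm{power}}$ and $d_{\mathrm{ground}}$ are introduced here only for illustration:
\[
p=\frac{1\,\mathrm{mm}}{20}=50\,\mu\mathrm{m},\qquad
d_{\mathrm{supply}}=2p=100\,\mu\mathrm{m},\qquad
d_{\mathrm{power}}=d_{\mathrm{ground}}=4p=200\,\mu\mathrm{m}.
\]
Here $p$ is the minimum PAD pitch, $d_{\mathrm{supply}}$ is the spacing when every second PAD carries a supply, and $d_{\mathrm{power}}$ and $d_{\mathrm{ground}}$ are the spacings between PADs of the same supply type.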

\subsubsection{Switching performance overhead}
Under the power grid spacing requirement, we investigated power supply switching for the most power-hungry circuit type, the inverter chain. The schematic of the structure is shown in Figure~\ref{fig:c}. The P/G supply is at the two ends of the inverter chain. The signals $control$ and $\overline{control}$ turn the power-gating transistors on and off. Signal integrity at the output nodes during $control$ switching is a major concern.

The healing effect requires an AC frequency above 20 kHz for copper~\cite{Jing:add}, so it is safe to use a 100 kHz switching frequency for the $control$ signal. With a circuit clock frequency of around 1 GHz, one $control$ period spans about 10k clock cycles.
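
For reference, the frequencies quoted above relate as follows; $N$ (the number of clock cycles per $control$ period), $f_{clk}$ and $f_{control}$ are symbols introduced here only for illustration:
\[
N=\frac{f_{clk}}{f_{control}}=\frac{1\,\mathrm{GHz}}{100\,\mathrm{kHz}}=10^{4},\qquad
\frac{f_{control}}{20\,\mathrm{kHz}}=5 .
\]
A 100 kHz $control$ signal therefore sits a factor of five above the 20 kHz healing threshold, while mode changes remain rare on the time scale of the circuit clock.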

We use an example circuit, a 128-inverter chain with a 260~$\mu$m power rail in a 130nm technology, to evaluate the performance overhead. Simulation results show that non-overlapping $control$ and $\overline{control}$ signals can cause a 10\% latency overhead on the rising edge and a 4\% latency overhead on the falling edge. Keeping both $control$ and $\overline{control}$ on for one additional cycle, a 0.1\% overlap, eliminates the performance degradation at switching.

\subsubsection{Sizing the power-gating transistor}
The size of the gating transistor determines the maximum current that can pass through it. However, larger transistors consume more chip area. For the 128-inverter chain in the 130nm technology node, the gate size of the $control$ switching transistors should be above 3~$\mu$m to achieve minimum performance impact, and above 2~$\mu$m for the circuit to remain functional.

\subsection{Optimize MTTF with EDA tools}

The proposed mechanism is based on the principle of applying
bidirectional current, but fully balanced AC stress at all nodes is
not practical. Even if the current is balanced, EM still cannot be
fully healed, because EM damage is the accumulated effect of the ion
migration probability at a given location at every time instant. Once
the current has shifted grain boundaries, this probability does not
return to its initial state under reversed current. Therefore, an AC-plus-DC model is applied to estimate the best EM MTTF~\cite{Tao1998295} under an unbalanced situation.

The healing effectiveness $\gamma$ is described as:
\begin{equation}
\gamma =1-2\left ( \frac{f_{0}}{f} \right )^{1/n}
\end{equation} 
where $f_0$ can be described as:
\begin{equation}
\frac{1}{2f_{0}}=\frac{A}{\overline{J_{DC}}^{n}}e^{E_a/kT}=MTTF_{DC}
\end{equation} 

The higher the frequency, the closer $\gamma$ approaches one. The current duty ratio $r$ modifies the overall AC MTTF as:
\begin{equation}
MTTF_{AC}=\frac{A}{\left(rJ_{+}-\gamma (1-r)J_{-}\right)^{n}}e^{E_a/kT}
\end{equation} 
where $J_+$ and $J_-$ stand for the current densities in the two opposite directions.
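
To illustrate the model, consider the idealized balanced case $J_{+}=J_{-}=J$ with $r=0.5$. This is a sketch under stated assumptions rather than a simulation result: the value $\gamma=0.9$ is hypothetical, and $n=1.1$ is the exponent used in the results section. Compared with a DC stress of the same density $J$:
\[
\frac{MTTF_{AC}}{MTTF_{DC}}
=\left(\frac{J}{\frac{1}{2}J-\frac{1}{2}\gamma J}\right)^{n}
=\left(\frac{2}{1-\gamma}\right)^{n}
=20^{1.1}\approx 27 .
\]
The gain grows rapidly as $\gamma$ approaches one, so the healing effectiveness matters as much as the current balance.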

We propose two approaches to optimize the EM MTTF. 

\begin{figure}[b]
\centering
\includegraphics[width=0.48\textwidth]{figure/compare}
\caption{\small{Approach 1 and Approach 2 simulation results.}}
\vspace{12pt}
\label{fig:compare}
\end{figure}

%\begin{itemize}
\noindent$\bullet$ \textbf{Approach 1:} Change the compensation power strip locations while keeping the duty ratio $r$ of the $control$ signal at $50\%$. This method provides better MTTF and keeps the control logic simple. However, a chip contains many blocks, so a large number of compensation strip locations must be determined. Searching for the optimal strip locations requires repeated placement and routing, which increases the total design time significantly.

\noindent$\bullet$ \textbf{Approach 2:} Change the duty ratio $r$ of the AC stress while fixing the compensation power strip in the middle of the power ring. The MTTF of the whole chip is a continuous function of $r$. However, this function is not differentiable everywhere, because it is a piecewise function constructed by taking the MTTF($r$) of the worst single node. Thus, EDA tools cannot solve for the best duty ratio analytically, and sweeping $r$ is a time-efficient alternative. The step size of the sweep depends on the precision required of the MTTF optimization. To limit the design time and keep the control logic simple, we suggest sweeping no more than 16 values of $r$ between 40\% and 60\%.
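
The non-smooth behavior in Approach 2 can be seen from the single-node model. As a sketch (the node index $i$ and the symbols $r^{*}_{i}$ and $MTTF_{chip}$ are introduced here only for illustration), the net stress term $rJ_{+,i}-\gamma(1-r)J_{-,i}$ of node $i$ vanishes at
\[
r^{*}_{i}=\frac{\gamma J_{-,i}}{J_{+,i}+\gamma J_{-,i}},
\]
so each node has its own preferred duty ratio, and the chip-level MTTF is the lower envelope
\[
MTTF_{chip}(r)=\min_{i}\,MTTF_{i}(r),
\]
which has a kink wherever the limiting node changes. For example, with $J_{+,i}=J_{-,i}$ and a hypothetical $\gamma=0.9$, $r^{*}_{i}=0.9/1.9\approx 0.47$, which lies inside the suggested 40\%--60\% sweep range.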


We use an example benchmark circuit to evaluate the effectiveness of these two approaches. The simulations are based on a $554\times554\;\mu\mathrm{m}^2$ MUL unit
in the 130nm GlobalFoundries technology at $25^{\circ}$C. For
approach 1, the strip offset is swept from 40\% to 60\% in 2\% steps,
and the layout is redone at each offset to observe the MTTF (the detailed MTTF
calculation method is discussed in section E). The strip in the middle
is defined as having a 50\% offset, and the MTTF at that point is
normalized to 1. For approach 2, the duty cycle is swept from 40\% to 60\%
in 2\% steps, and the MTTF at a 50\% duty ratio is normalized to 1. The
MTTF results are shown in Figure~\ref{fig:compare}. The
best MTTF of approach 1 is similar to the MTTF obtained when
the compensation strip is placed in the middle of the chip. Meanwhile, the
maximum MTTF in approach 2 is about twice that of the 50\% duty cycle
design. Consequently, we conclude that the optimization
should place the compensation strip in the middle of the layout and
sweep the duty ratio for the best MTTF, using the following
algorithm:

\begin{algorithm}[h]
\caption{The EM aware design flow}
\label{alg1}
\begin{algorithmic}
\REQUIRE User inputs the power rings and strips information
\ENSURE Meeting the user power rings/strips requirements and form the compensation network. Find best duty ratio of the control signal to optimize the MTTF. 
\FORALL {blocks}
\STATE Make the power ring/strips, avoiding wire widths of 1--2 grain sizes
\STATE Make the compensation network and connect the compensation networks together.
\ENDFOR
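\STATE $temp \leftarrow 0$ \COMMENT{best MTTF found so far}
\FOR{$r$=0.4 to 0.6 $step$=0.02}
    \STATE Find $MTTF(r)$
    \IF {$MTTF(r) > temp$}
        \STATE $temp \leftarrow MTTF(r)$ \COMMENT{keep the largest MTTF}
        \STATE $r_{opt} \leftarrow r$
    \ENDIF
\ENDFOR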
\STATE Generate the control circuit with $r_{opt}$
\end{algorithmic}
\end{algorithm}
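
In compact form, the sweep in Algorithm~\ref{alg1} can be read as the following selection, given that its stated goal is the best MTTF; the set symbol $R$ is introduced here only for illustration:
\[
r_{opt}=\arg\max_{r\in R} MTTF(r),\qquad
R=\{0.40,\,0.42,\,\ldots,\,0.60\},\qquad
|R|=\frac{0.60-0.40}{0.02}+1=11 .
\]
The 11 evaluations stay within the suggested limit of 16 sweep points.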

\section{Results}
\subsection{Technical Detail}
Three different units from OpenSPARC T1~\cite{Jing:sparc} are used to
verify the proposed EM healing method: the Floating-point
Front-end Unit (FFU), the Multiplier Unit (MUL), and the Stream Processing Unit (SPU). They are chosen because they have different functionality and reasonable sizes.

The technology libraries used in this paper are the 130nm GlobalFoundries process with a 1.5 V supply voltage and the 45nm NCSU FreePDK process with a 1.1 V supply voltage. The simulations were performed at an ambient temperature of $25^{\circ}$C and an on-chip temperature of $55^{\circ}$C.

\begin{figure}[t]
\centering
\includegraphics[width=0.45\textwidth]{figure/new4in1}
\caption{\small{IR drop and EM damage in an EM compensation design. Red parts are the
    most severe parts. (A) IR drop in the normal mode. Power is supplied
    from the four corners of the chip; (B) IR drop in the compensation
    mode. Power is supplied from the middle of the chip; (C) EM damage map in the normal mode; (D)
    EM damage map in the compensation mode.}}
\label{fig:layout1}
\end{figure}

\subsection{Experiment and Data Analysis}
Our experiments and comparisons are based on four setups.
%\begin{itemize}
%\item

\noindent$\bullet$ The normal mode: the chip is driven by power ring. 

\noindent$\bullet$ The compensation mode: only the compensation strip drives the chip.

\noindent$\bullet$ The coarse bidirectional mode: half of the time the chip is driven by the power ring and another half by the compensation strip.

\noindent$\bullet$ The balanced bidirectional mode: the ratio of the time driven by the power
ring to the time driven by the compensation strip is adjusted to balance the current in
each direction.

Uni-directional current (DC) MTTF can be calculated from Black's equation. For an AC stress with different forward and backward current densities, the MTTF is related to the DC MTTF as follows, where $M$ stands for MTTF and $n=1.1$~\cite{Jing:ISCAEM}:
\begin{equation}
(M_{AC})^{-\frac{1}{n}}= 
\frac{1}{2}(M _{DC,+})^{-\frac{1}{n}}-\frac{\gamma}{2}(M_{DC,-})^{-\frac{1}{n}}
\end{equation}

%\begin{equation}
%M_{AC}=
%\frac{1}{\frac{1}{2}((M_{DC,+})^{-\frac{1}{n}}-\frac{\gamma}{2}(M_{DC,-})^{-%\frac{1}{n}})^{n}}
%\end{equation}

In the balanced bidirectional method with optimized duty ratio $r$:
\begin{equation}
M_{AC}=\frac{1}{(r(M_{DC,+})^{-\frac{1}{n}}-(1-r)\gamma(M_{DC,-})^{-\frac{1}{n}})^{n}}
\end{equation}
The $control$ signal has a 0.1\% overlap to prevent performance degradation; this overlap is small enough to be treated as no overlap in the calculation.
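
The two expressions above are consistent with the current-density form of $MTTF_{AC}$ given earlier. As a short derivation sketch, write Black's equation for each direction as $M_{DC,\pm}=A\,J_{\pm}^{-n}e^{E_a/kT}$; the constant $C$ is introduced here only for illustration:
\[
(M_{DC,\pm})^{-\frac{1}{n}}=C\,J_{\pm},\qquad
C=\left(A\,e^{E_a/kT}\right)^{-\frac{1}{n}},
\]
\[
(M_{AC})^{-\frac{1}{n}}
=C\left(rJ_{+}-\gamma(1-r)J_{-}\right)
=r(M_{DC,+})^{-\frac{1}{n}}-(1-r)\gamma(M_{DC,-})^{-\frac{1}{n}} .
\]
Setting $r=\frac{1}{2}$ recovers the expression for equal time in each mode.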

We use IR drop plots to determine the current directions. In
Figure~\ref{fig:layout1}A and B, red indicates high IR drop
and green indicates low IR drop. The precise current
direction at every node is calculated from the voltage
levels. The plots show that the current directions in the normal and the
compensation modes are roughly opposite. Although different input
patterns may change the current direction at some locations, these
locations are not EM-critical. The locations with the most severe EM
issues do not change their current directions with input
patterns. Figure~\ref{fig:layout1}C and D show the most severe EM
locations in red. The goal is to improve the EM MTTF at these
locations. An example of the coarse and balanced bidirectional mode EM
healing results for the FFU block is shown in
Figure~\ref{fig:layout2}. The points that previously had the lowest MTTF are healed.

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{figure/figure4_2d_org}
\caption{\small{EM healing results. (left) The coarse bidirectional mode result; (right) the balanced bidirectional mode result.}}
\label{fig:layout2}
\end{figure}

\begin{figure}
\centering
\includegraphics[width=0.45\textwidth]{figure/mul2_org}
\caption{\small{The MUL layout at two technology nodes. (left) The 130nm process with only power ring; (right) the 45nm process with a normal power strip in purple and two compensation power strips in red.}}
\label{fig:mul2}
\end{figure}

The area utilization remains the same (70\%) for all the chips, but when the feature size shrinks, finer power grid division is used for the same design to maintain reasonable IR drop. 
%The minimum width of the metal layers shrinks, but it is not at the same pace as the channel length. Therefore, the area decrease is not as large as the feature size. The two technologies used in this study are from different foundries. The 130nm Globalfoundry process is more mature and gives better result in place and route. 
Figure~\ref{fig:mul2} compares the MUL unit design in the two processes. For the 130nm process, MUL keeps the single power ring structure. For the 45nm process, a division into two blocks is used to meet the IR drop requirements; two compensation strips are applied and driven together in this case. The area overhead in all experiment cases is no more than 5.5\% (Table~\ref{tab:area}). Power grids consume proportionally more area for smaller chips, but this trend is observed in all chip designs and is not an artifact of our mechanism.

We compared the MTTF of these three functional blocks under the four experiment
modes. The MTTF for different designs under the same technology and
temperature is shown in Figure~\ref{fig:result}. For the normal mode
(baseline design), the MTTFs of the three designs are very close even though their physical designs (floorplanning/placement/routing) are quite different.  %It shows that the MTTF of a design cannot increase beyond a certain level by re-floorplan and re-place \& route without applying the self-healing method.  
However, with the addition of the compensation strip and bidirectional AC stress, the MTTF improves dramatically. The largest improvement is more than ten times, and the smallest is around three times. Of the two self-healing schemes, the balanced mode achieves a better improvement than the coarse mode. The variation in improvement is related to power density and placement.

\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{figure/result}
\caption{\small{EM enhancement results for different designs, technology nodes, and temperatures. MTTF is given in hours.}}
\label{fig:result}
\end{figure}

\begin{table}\footnotesize
\begin{center}
\vspace{20pt}
\caption{Area ($\mu$m $\times$ $\mu$m) and Overhead}\label{tab:area}
\begin{tabular}{|c|c|c|c|c|}
\hline
tech & & SPU & FFU & MUL\\
\hline
$45nm$ & area & $980\times980$ & $837\times837$ & $418\times418$\\
\cline{2-5}
& overhead & 4\% & 4.70\% & 5.50\%\\
\hline
$130nm$ & area & $1310\times1310$ & $1100\times1100$ & $554\times554$\\
\cline{2-5}
& overhead & $3\%$ & $3.60\%$ & $4.50\%$\\
\hline
\end{tabular}
\vspace{-20pt}
\end{center}
\end{table}

\begin{figure}[t]
\centering
\includegraphics[width=0.47\textwidth]{figure/result2}
\caption{\small{EM MTTF improvement ratio for the coarse mode and the balanced mode.}}
\label{fig:result2}
\end{figure}

The trends for temperature and technology scaling also follow the
theoretical analysis, as shown in Figure~\ref{fig:result}. Comparing
the two technologies (130nm versus 45nm), the EM MTTF differs by about 10 times for all
designs. The 45nm process is three generations smaller than the 130nm
process ($z=3$), so this result is close to the assumed 9 times EM MTTF
scaling ($z^2$). A 30-degree rise in temperature decreases the
MTTF by ten times.
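
These two observations can be checked with simple arithmetic. The sketch below assumes the usual node sequence 130nm, 90nm, 65nm, 45nm for the generation count. For the temperature term it assumes that the 30-degree rise is from $25^{\circ}$C to $55^{\circ}$C ($T_1=298.15$ K, $T_2=328.15$ K) and that only the Arrhenius factor of Black's equation changes, so the resulting $E_a$ is only an implied, illustrative value:
\[
z=3,\qquad z^{2}=9\approx 10,
\]
\[
\frac{MTTF(T_1)}{MTTF(T_2)}
=\exp\left[\frac{E_a}{k}\left(\frac{1}{T_1}-\frac{1}{T_2}\right)\right]=10
\quad\Rightarrow\quad
E_a=\frac{k\ln 10}{1/T_1-1/T_2}\approx 0.65\ \mathrm{eV},
\]
with $k=8.617\times10^{-5}$ eV/K.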